I'm pretty happy with this solution. It's not perfectly optimal, but there aren't any obvious improvements, and it works out nicely to 1 input per tape loop. The biggest weak spot is how the main product is assembled: I suspect there's some way to do it much more efficiently, since arm 5 has so many instructions.